|
lxh623
Junior member
Points 34
Posts 30
Registered 2008-11-24
Status: offline
|
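『Original post』:
[Resolved] Extracting text content
A folder contains many text files, and I want to extract some of their content in batch. Each file is fairly large, around 10 MB.
Requirements (items 1 and 2 describe the task; the numbering does not imply an order):
1. Extract everything from “UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US) ” [inclusive] up to the next “PATENT (Number; Kind; Date): ” [exclusive].
2. Extract every line that contains one of the following: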
BASIC-PATENT:
PATENT (Number; Kind; Date): European Patent Office (EP)
PATENT (Number; Kind; Date): United States of America (US)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)
PATENT (Number; Kind; Date): Canada (CA)
PATENT (Number; Kind; Date): People's Republic of China (CN)
PATENT (Number; Kind; Date): Japan (JP)
PATENT (Number; Kind; Date): Republic of Korea (KR)
PATENT (Number; Kind; Date): United Kingdom (GB)
PATENT (Number; Kind; Date): Germany (DE)
PATENT (Number; Kind; Date): France (FR)
PATENT (Number; Kind; Date): Russian Federation (RU)
A sample of the text follows. (One file may contain a hundred sections like this, separated by “BASIC-PATENT:”.)
BASIC-PATENT:
European Patent Office (EP) 277,004; A1; August 03, 1988
PATENT FAMILY
Number of Patents: 276
TAIWAN (TW)
PATENT (Number; Kind; Date): Taiwan (TW) 464,511; B; November 21, 2001
TITLE: Pressure-sensitive adhesive composition suitable for use in a transdermal drug delivery system and preparation method therefor
INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1994-178558; A; January 07, 1994
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): Taiwan (TW) 19958410044; A; January 19, 1995
INT-CL: A61K9/00 (Section A, Class 61, Sub-class K, Group 9, Sub-group 00)
A61K31/74 (Section A, Class 61, Sub-class K, Group 31, Sub-group 74)
ABST:
A blend of at least three polymers, including a soluble polyvinylpyrrolidone, in combination with a drug provides a pressure-sensitive adhesive composition for a transdermal drug delivery system in which the drug is delivered from the pressure-sensitive adhesive composition and through dermis when the pressure-sensitive adhesive composition is in contact with human skin. Soluble polyvinylpyrrolidone increases the solubility of drug without negatively affecting the adhesivity of the composition or the rate of drug delivery from the pressure-sensitive adhesive composition.
UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US) 5,958,446; A; September 28, 1999
TITLE: SOLUBILITY PARAMETER BASED DRUG DELIVERY SYSTEM AND METHOD FOR ALTERING DRUG SATURATION CONCENTRATION
INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1995-433754; A; May 04, 1995
United States of America (US) 1991-722342; A1; June 27, 1991
United States of America (US) 1989-295847; A2; January 11, 1989
United States of America (US) 1988-164482; A2; March 04, 1988
United States of America (US) 1991-671709; A2; April 02, 1991
World Intellectual Property Organisation (WO) 1990US9001750; W; March 28, 1990
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): United States of America (US) 1995433754; A; May 04, 1995
INT-CL: A61F13/02 (Section A, Class 61, Sub-class F, Group 13, Sub-group 02)
NAT-CL: 424448; X426449
EURO-CL: A61F13/02M; A61K9/70E; A61L15/18; A61L15/58; A61L15/58M+C08L33/00; A61L15/58M+C08L31/04
DERWENT NUMBER: C1989-106432; C1990-225696; C1991-230072; C1991-310376; C1993-036110; C1994-109332; C1995-044946; C1997-558092
CHEMICAL ABSTRACT NUMBER: 111(10)084137W; 114(04)030158X; 116(10)091389M; 118(16)154566F; 120(26)331144F; 128(15)184708C
ABST:
The method of adjusting the saturation concentration of a drug in a transdermal composition for application to the dermis, which comprises mixing polymers having differing solubility parameters, so as to modulate the delivery of the drug. This results in the ability to achieve a predetermined permeation rate of the drug into and through the dermis. In one embodiment, a dermal composition of the present invention comprises a drug, an acrylate polymer, and a polysiloxane. The dermal compositions can be produced by a variety of methods known in the preparation of drug-containing adhesive preparations, including the mixing of the polymers, drug, and additional ingredients in solution, followed by removal of the processing solvents. The method and composition of this invention permit selectable loading of the drug into the dermal formulation and adjustment of the delivery rate of the drug from the composition through the dermis, while maintaining acceptable shear, tack, and peel adhesive properties.
PATENT (Number; Kind; Date): United States of America (US) 5,300,291; A; April 05, 1994
TITLE: METHOD AND DEVICE FOR THE RELEASE OF DRUGS TO THE SKIN
INVENTOR: SABLOTSKY STEVEN, United States of America (US); GENTILE JOSEPH A, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1989-295847; A2; January 11, 1989
United States of America (US) 1988-164482; A2; March 04, 1988
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): United States of America (US) 1991671709; A; April 02, 1991
INT-CL: A61K31/74 (Section A, Class 61, Sub-class K, Group 31, Sub-group 74)
NAT-CL: 424 7818; X424485; X424484; X424448
DERWENT NUMBER: C89-106432; C90-225696; C91-230072
CHEMICAL ABSTRACT NUMBER: 111(10)084137W; 114(04)030158X
ABST:
A method of increasing the adhesiveness of a shaped pressure sensitive adhesive, comprising adding an adhesiveness and drug release increasing amount of a clay to said adhesive prior to casting of the adhesive. A dermal composition comprising a drug, a pressure sensitive adhesive, an adhesiveness increasing amount of a clay and a solvent. A dermal composition comprising a drug, a multipolymer of ethylene vinyl acetate, an acrylic polymer, a natural or synthetic rubber and a clay, along with optional ingredients known for use in transdermal drug delivery systems.
WORLD INTELLECTUAL PROPERTY ORGANISATION (WO)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO) 9,640,086; A3; February 13, 1997
TITLE: COMPOSITIONS AND METHODS FOR TOPICAL ADMINISTRATION OF PHARMACEUTICALLY ACTIVE AGENTS
INVENTOR: KANIOS DAVID P, United States of America (US); GENTILE JOSEPH A, United States of America (US); MANTELLE JUAN A, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1995-477361; A; June 07, 1995
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US); KANIOS DAVID P, United States of America (US); GENTILE JOSEPH A, United States of America (US); MANTELLE JUAN A, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
APPLICATION (Number; Kind; Date): World Intellectual Property Organisation (WO) 199608294; A; June 05, 1996
INT-CL: A61K9/70 (Section A, Class 61, Sub-class K, Group 9, Sub-group 70)
EURO-CL: A61K9/00M18D; A61K9/70E
DESIGNATED COUNTRIES: Albania (AL); Armenia (AM); Austria (AT); Australia (AU); Azerbaijan (AZ); Barbados (BB); Bulgaria (BG); Brazil (BR); Belarus (BY); Canada (CA); Switzerland (CH); People's Republic of China (CN); Czech Republic (CZ); Germany (DE); Denmark (DK); Estonia (EE); Spain (ES); Finland (FI); United Kingdom (GB); Georgia (GE); Hungary (HU); Israel (IL); Iceland (IS); Japan (JP); Kenya (KE); Kyrgyzstan (KG); Democratic Peoples Rep. of Korea (KP); Republic of Korea (KR); Kazakhstan (KZ); Sri Lanka (LK); Liberia (LR); Lesotho (LS); Lithuania (LT); Luxembourg (LU); Latvia (LV); Moldova, Republic of (MD); Madagascar (MG); former Yugoslav Republic of Macedonia (MK); Mongolia (MN); Malawi (MW); Mexico (MX); Norway (NO); New Zealand (NZ); Poland (PL); Portugal (PT); Romania (RO); Russian Federation (RU); Sudan (SD); Sweden (SE); Singapore (SG); Slovenia (SI); Slovakia (SK); Tajikistan (TJ); Turkmenistan (TM); Turkey (TR); Trinidad and Tobago (TT); Ukraine (UA); Uganda (UG); United States of America (US); Uzbekistan (UZ); Vietnam (VN); Armenia (AM); Azerbaijan (AZ); Belarus (BY); Kyrgyzstan (KG); Kazakhstan (KZ); Moldova, Republic of (MD); Russian Federation (RU); Tajikistan (TJ); Turkmenistan (TM)
DESIGNATED STATES REGISTERED PATENT: Kenya (KE); Lesotho (LS); Malawi (MW); Sudan (SD); Swaziland (SZ); Uganda (UG); Austria (AT); Belgium (BE); Switzerland (CH); Germany (DE); Denmark (DK); Spain (ES); Finland (FI); France (FR); United Kingdom (GB); Greece (GR); Ireland (IE); Italy (IT); Luxembourg (LU); Monaco (MC); Netherlands (NL); Portugal (PT); Sweden (SE); Burkina Faso (BF); Benin (BJ); Central African Empire (CF); Congo (CG); Ivory Coast (CI); Cameroon (CM); Gabon (GA)
LANGUAGE: English
DERWENT NUMBER: C1997-051830
CHEMICAL ABSTRACT NUMBER: 126(09)122452Q; 128(15)184708C
FILING DETAILS: WO 300000; Without international search report and to be republished upon receipt of that report;
ABST:
Compositions for topical application comprising a therapeutically effective amount of a pharmaceutical agent(s), a pharmaceutically acceptable bioadhesive carrier, a solvent for the pharmaceutical agent(s) in the carrier and a clay, and methods of administering the pharmaceutical agents to a mammal are disclosed.
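Read literally, the two rules describe a single pass over the lines with one flag for "inside a US block". The following is a minimal sketch of that reading in plain JavaScript, not the code of any poster in this thread. The names `extractPatentLines` and `containsAny` are hypothetical. It assumes that the heading sits on its own line directly above the first US PATENT line, and that a block ends at the next line beginning with `PATENT (Number; Kind; Date):`.

```javascript
// Rule 2 patterns from the original post: a line is kept if it contains one.
var KEEP = [
  "BASIC-PATENT:",
  "PATENT (Number; Kind; Date): European Patent Office (EP)",
  "PATENT (Number; Kind; Date): United States of America (US)",
  "PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)",
  "PATENT (Number; Kind; Date): Canada (CA)",
  "PATENT (Number; Kind; Date): People's Republic of China (CN)",
  "PATENT (Number; Kind; Date): Japan (JP)",
  "PATENT (Number; Kind; Date): Republic of Korea (KR)",
  "PATENT (Number; Kind; Date): United Kingdom (GB)",
  "PATENT (Number; Kind; Date): Germany (DE)",
  "PATENT (Number; Kind; Date): France (FR)",
  "PATENT (Number; Kind; Date): Russian Federation (RU)"
];
var US_HEADING = "UNITED STATES OF AMERICA (US)";
var US_PATENT = "PATENT (Number; Kind; Date): United States of America (US)";
var ANY_PATENT = "PATENT (Number; Kind; Date):";

// True if the line contains at least one of the given strings.
function containsAny(line, patterns) {
  for (var i = 0; i < patterns.length; i++) {
    if (line.indexOf(patterns[i]) !== -1) return true;
  }
  return false;
}

// Apply rule 1 (US block) and rule 2 (listed lines) in one pass, keeping file order.
function extractPatentLines(text) {
  var lines = text.split(/\r?\n/);
  var out = [];
  var inBlock = false;
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    if (inBlock) {
      if (line.indexOf(ANY_PATENT) === 0) {
        inBlock = false; // block ends here; the line itself falls through to rule 2
      } else {
        out.push(line);
        continue;
      }
    }
    var isHeading = line.replace(/\s+$/, "") === US_HEADING;
    if (isHeading && i + 1 < lines.length && lines[i + 1].indexOf(US_PATENT) === 0) {
      out.push(line);         // the heading
      out.push(lines[i + 1]); // the first US PATENT line
      i++;
      inBlock = true;
      continue;
    }
    if (containsAny(line, KEEP)) out.push(line);
  }
  return out.join("\r\n");
}
```

The PATENT line that ends a block is not part of the block, but it is still checked against rule 2, so it is kept when its country is on the list. File reading is left out of the sketch on purpose.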
Thank you!
Attachments
Short version: http://www.namipan.com/d/1.txt/f ... 5e01d72bf7d77bb0000
Full version:
http://www.namipan.com/d/sour.tx ... c2dc733dcb12471ce00
[ Last edited by lxh623 on 2009-4-13 at 23:30 ]
|
|
2009-4-11 22:11 |
|
|
freeants001
Intermediate member
Points 330
Posts 244
Registered 2006-4-14, from Hubei
Status: offline
|
『Post 2』:
FINDSTR [/B] [/E] [/L] [/R] [/S] [/I] [/X] [/V] [/N] [/M] [/O] [/F:file]
[/C:string] [/G:file] [/D:dir list] [/A:color attributes] [/OFF[LINE]]
strings [[drive:][path]filename[ ...]]
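/B Matches pattern if at the beginning of a line.
/E Matches pattern if at the end of a line.
/L Uses search strings literally.
/R Uses search strings as regular expressions.
/S Searches for matching files in the current directory and all
subdirectories.
/I Specifies that the search is not to be case-sensitive.
/X Prints lines that match exactly.
/V Prints only lines that do not contain a match.
/N Prints the line number before each line that matches.
/M Prints only the filename if a file contains a match.
/O Prints character offset before each matching line.
/P Skip files with non-printable characters.
/OFF[LINE] Do not skip files with offline attribute set.
/A:attr Specifies color attribute with two hex digits. See "color /?"
/F:file Reads file list from the specified file (/ stands for console).
/C:string Uses specified string as a literal search string.
/G:file Gets search strings from the specified file (/ stands for console).
/D:dir Search a semicolon delimited list of directories
strings Text to be searched for.
[drive:][path]filename
Specifies a file or files to search.
Use spaces to separate multiple search strings unless the argument is prefixed
with /C. For example, 'FINDSTR "hello there" x.y' searches for "hello" or
"there" in file x.y. 'FINDSTR /C:"hello there" x.y' searches for
"hello there" in file x.y.
Regular expression quick reference:
. Wildcard: any character
* Repeat: zero or more occurrences of previous character or class
^ Line position: beginning of line
$ Line position: end of line
[class] Character class: any character in set
[^class] Inverse class: any character not in set
[x-y] Range: any characters within the specified range
\x Escape: literal use of metacharacter x
\<xyz Word position: beginning of word
xyz\> Word position: end of word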
|
|
2009-4-11 22:33 |
|
|
yishanju
Silver member
Points 1488
Posts 1357
Registered 2006-5-20
Status: offline
|
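『Post 3』:
It would be best to upload some of the text files.
Reading this makes my head spin; I can't tell what you are trying to do,
or what output format you want.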
[ Last edited by yishanju on 2009-4-12 at 00:52 ]
|
|
2009-4-12 00:47 |
|
|
yishanju
|
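『Post 4』:
I recommend the regex find-and-replace tool FR for this.
It makes it easy to filter out the extraneous information and keep only the content you want.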
|
|
2009-4-12 01:04 |
|
|
netbenton
Silver member
Batch programming enthusiast
Points 1916
Posts 752
Registered 2008-12-28, from Guangxi
Status: offline
|
『Post 5』:
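@echo off&setlocal enabledelayedexpansion
rem ho is the country heading, bg marks the first line of a US block,
rem and en is the text whose next occurrence ends the block.
set ho=UNITED STATES OF AMERICA (US)
set bg=PATENT (Number; Kind; Date): United States of America (US)
set en=PATENT (Number; Kind; Date):
rem Lines containing li1 or li2 are always printed.
set li1=PATENT (Number; Kind; Date):
set li2=BASIC-PATENT:
rem ver is defined only while the loop is inside a US block.
set "ver="
rem Read sour.txt line by line and collect the output in dest.txt.
(for /f "delims=" %%a in (sour.txt) do (set "str=%%a"&call :sub %%a))>dest.txt
start dest.txt
pause
goto :eof
:sub
if defined ver (echo.!str!
if not "!str:%en%=!"=="!str!" set ver=
goto :eof)
if not "!str:%bg%=!"=="!str!" (set ver=y&echo !ho!&echo.!str!&goto :eof)
if not "!str:%li1%=!"=="!str!" echo !str!
if not "!str:%li2%=!"=="!str!" echo !str!
goto :eof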
|
|
2009-4-12 01:53 |
|
|
freeants001
|
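『Post 6』:
Copy and save the following as jsConvert.js.
Assume the file to convert is a.txt.
At the command line, run: cscript /nologo jsConvert.js a.txt
The converted output is written to a.txt_转换后.txt (the suffix means "converted").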
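// Usage: cscript /nologo jsConvert.js <file>
var File_Path=WScript.arguments(0);
var sss,arr,osss="";
var fso=new ActiveXObject("scripting.filesystemobject");
// Read the whole input file (1 = ForReading).
var fl=fso.opentextfile(File_Path,1);sss=fl.readall();fl.close();
// Open the output file (2 = ForWriting, true = create it if missing).
fl=fso.opentextfile(File_Path+"_转换后.txt",2,true);
// Alternative 1: a US block, from the heading through the next PATENT line.
// Alternative 2: any single PATENT line.
var re=/\r\nUNITED STATES OF AMERICA \(US\)[\s]*PATENT \(Number; Kind; Date\): United States of America \(US\)[\s\S]*?\r\nPATENT \(Number; Kind; Date\)\:.*|\r\nPATENT \(Number; Kind; Date\)\:.*/g;
// arr[0] holds the full text of each match.
while ((arr=re.exec(sss))!=null)osss=osss+arr[0]+"\r\n";
fl.write(osss);fl.close();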
|
|
2009-4-12 04:25 |
|
|
lxh623
|
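『Post 7』:
Quote: | Originally posted by yishanju at 2009-4-12 00:47:
It would be best to upload some of the text files.
Reading this makes my head spin; I can't tell what you are trying to do,
or what output format you want.
[ Last edited by yishanju on 2009-4-12 at 00:52 ] |
|
New members are not allowed to upload attachments; I will upload the files shortly.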
|
|
2009-4-12 10:25 |
|
|
lxh623
|
『Post 8』:
Quote: | Originally posted by netbenton at 2009-4-12 01:53:
@echo off&setlocal enabledelayedexpansion
set ho=UNITED STATES OF AMERICA (US)
set bg=PATENT (Number; Kind; Date): United States of America (US)
set en=PATENT (Number; Kind; Date):
set li ... |
|
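Thank you!
1. I tried it; at around section 59 the error “> was unexpected at this time.” appears.
2. Also, for PATENT (Number; Kind; Date): lines, could it keep only the kinds listed below? Otherwise entries such as Argentina have to be deleted afterwards. Of course, if that is how it has to be, that is fine too.
3. How can the batch file be made to process an entire folder automatically, editing the original files in place?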
PATENT (Number; Kind; Date): European Patent Office (EP)
PATENT (Number; Kind; Date): United States of America (US)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)
PATENT (Number; Kind; Date): Canada (CA)
PATENT (Number; Kind; Date): People's Republic of China (CN)
PATENT (Number; Kind; Date): Japan (JP)
PATENT (Number; Kind; Date): Republic of Korea (KR)
PATENT (Number; Kind; Date): United Kingdom (GB)
PATENT (Number; Kind; Date): Germany (DE)
PATENT (Number; Kind; Date): France (FR)
PATENT (Number; Kind; Date): Russian Federation (RU)
[ Last edited by lxh623 on 2009-4-12 at 10:51 ]
|
|
2009-4-12 10:30 |
|
|
lxh623
|
『Post 9』:
Quote: | Originally posted by freeants001 at 2009-4-12 04:25:
Copy and save the following as jsConvert.js.
Assume the file to convert is a.txt.
At the command line, run: cscript /nologo jsConvert.js a.txt
The converted output is written to a.txt_转换后.txt
[code]File_Path=WScript.argum ... |
|
Thank you!
I'm afraid I'm a bit slow: I tried it but still couldn't get it to work. Also, is there a way to process a whole folder?
|
|
2009-4-12 10:32 |
|
|
freeants001
|
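『Post 10』:
Quote: | 1. I tried it; at around section 59 the error “> was unexpected at this time.” appears. |
|
Please give more detail.
Also, for lines beginning with "PATENT (Number; Kind; Date): ", should only the following countries be kept?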
Quote: | PATENT (Number; Kind; Date): European Patent Office (EP)
PATENT (Number; Kind; Date): United States of America (US)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)
PATENT (Number; Kind; Date): Canada (CA)
PATENT (Number; Kind; Date): People's Republic of China (CN)
PATENT (Number; Kind; Date): Japan (JP)
PATENT (Number; Kind; Date): Republic of Korea (KR)
PATENT (Number; Kind; Date): United Kingdom (GB)
PATENT (Number; Kind; Date): Germany (DE)
PATENT (Number; Kind; Date): France (FR)
PATENT (Number; Kind; Date): Russian Federation (RU) |
|
|
|
2009-4-12 10:40 |
|
|
lxh623
|
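『Post 11』:
Quote: | Originally posted by freeants001 at 2009-4-12 10:40:
Please give more detail.
Also, for lines beginning with "PATENT (Number; Kind; Date): ", should only the following countries be kept?
|
|
Sorry for the confusion over the quote in post 8: that error comes from the batch file, not the script.
As for the script: after I saved it as Unicode it runs, but the output is rather jumbled. Ideally the original order should be preserved.
It is very fast. PATENT (Number; Kind; Date): United States of America (US)
appears 1186 times, the same as in the source.
However, of the 100 BASIC-PATENT entries in the source, only 6 come out.
Issues with the batch file:
First, I misquoted the message; it is “< was unexpected at this time.”
Second, yes: keep only those ten or so patterns.
I will upload the files shortly.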
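One possible explanation for the missing BASIC-PATENT lines, offered as a guess: the pattern in post 6 has no alternative for `BASIC-PATENT:`, so such a line can only show up when the lazy `[\s\S]*?` of a US block happens to run across it on the way to the next PATENT line. A line-oriented pattern avoids that. The sketch below is plain JavaScript with the hypothetical names `LINE_RE` and `matchLinesInOrder`. It covers only the single-line rule (all PATENT lines, not just the listed countries) and does not reproduce the US block extraction.

```javascript
// With the m flag, ^ and $ match at line boundaries, so each alternative
// matches one whole line. Matches are returned in file order.
var LINE_RE = /^(?:BASIC-PATENT:|PATENT \(Number; Kind; Date\): ).*$/gm;

function matchLinesInOrder(text) {
  var found = [];
  var m;
  LINE_RE.lastIndex = 0; // reset, because a global regex remembers its position
  while ((m = LINE_RE.exec(text)) !== null) {
    found.push(m[0]);
  }
  return found;
}
```

Because `.` does not match `\r` or `\n` in JavaScript, the matched lines carry no line-ending characters, whether the file uses `\r\n` or `\n`.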
[ Last edited by lxh623 on 2009-4-12 at 11:20 ]
|
|
2009-4-12 10:47 |
|
|
netbenton
|
『Post 12』:
The part you posted passed my tests. To make sure the whole file works, please upload it first and we can go from there.
|
|
2009-4-12 11:28 |
|
|
lxh623
|
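『Post 13』:
Quote: | Originally posted by netbenton at 2009-4-12 11:28:
The part you posted passed my tests. To make sure the whole file works, please upload it first and we can go from there. |
|
The attachments are uploaded; see the bottom of the first post. Thank you both!
Thanks, netbenton. In the final test the message still appeared, but it worked well and produced the results.
It took ten minutes.
1. Is it possible to process multiple files?
2. Can the output be limited to those ten or so countries? Alternatively, could I trouble you for another batch file that deletes unwanted lines from all the txt files in a folder? For example, a text file a would list the unwanted lines:
PATENT (Number; Kind; Date): Austria (AT)
PATENT (Number; Kind; Date): Argentina (AR), and so on.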
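For the two follow-up requests (process every file in a folder, and delete the lines listed in a separate file), here is a rough sketch. It assumes Node.js rather than Windows Script Host, UTF-8 input, and that "unwanted" means "the line starts with one of the listed strings". The names `dropUnwantedLines`, `readUnwantedList`, and `processFolder` are hypothetical. Results go to a separate output folder so the originals stay untouched.

```javascript
var fs = require("fs");
var path = require("path");

// Drop every line that starts with one of the unwanted prefixes.
function dropUnwantedLines(text, unwanted) {
  return text.split(/\r?\n/).filter(function (line) {
    return !unwanted.some(function (prefix) {
      return prefix !== "" && line.indexOf(prefix) === 0;
    });
  }).join("\r\n");
}

// Read the unwanted prefixes from a list file, one per line; blank lines are ignored.
function readUnwantedList(listFile) {
  return fs.readFileSync(listFile, "utf8").split(/\r?\n/).filter(function (s) {
    return s.replace(/\s+$/, "") !== "";
  });
}

// Filter every .txt file in inDir and write the result under the same name in outDir.
// Returns the number of files processed.
function processFolder(inDir, outDir, unwanted) {
  fs.mkdirSync(outDir, { recursive: true });
  var names = fs.readdirSync(inDir).filter(function (name) {
    return path.extname(name).toLowerCase() === ".txt";
  });
  names.forEach(function (name) {
    var text = fs.readFileSync(path.join(inDir, name), "utf8");
    fs.writeFileSync(path.join(outDir, name), dropUnwantedLines(text, unwanted), "utf8");
  });
  return names.length;
}
```

A call might look like `processFolder("in", "out", readUnwantedList("a.txt"))`, where `in`, `out`, and `a.txt` are placeholder paths. Each file is read into memory whole, which should be workable for files of around 10 MB.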
[ Last edited by lxh623 on 2009-4-12 at 12:00 ]
|
|
2009-4-12 11:37 |
|
|
netbenton
|
|
2009-4-12 12:18 |
|
|
netbenton
|
|
2009-4-12 12:39 |
|
|